---
title: Audio Datasets
keywords: fastai
sidebar: home_sidebar
summary: "This module defines datasets for working with audio data in deep learning applications"
description: "This module defines datasets for working with audio data in deep learning applications"
nb_path: "nbs/01audio_dataset.ipynb"
---
The audio labels are stored in a dataframe with the columns `(id, label, tmin, tmax, fmin, fmax)` (see the example below). Each row defines a bounding box in time and frequency for a label in the corresponding audio clip. Multiple boxes may exist for the same audio `id`.
| id | label | tmin | tmax | fmin | fmax | ... |
|---|---|---|---|---|---|---|
| a239gfdda | 10 | 2.4 | 4.1 | 5000 | 10000 | ... |
| b94k2g0as | 4 | 23.7 | 40.3 | 2500 | 7000 | ... |
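As an illustration, a labels dataframe with this schema can be built directly with pandas (the rows here are the made-up values from the table above, not real data):

```python
import pandas as pd

# Hypothetical rows mirroring the (id, label, tmin, tmax, fmin, fmax) schema
df = pd.DataFrame([
    {'id': 'a239gfdda', 'label': 10, 'tmin': 2.4,  'tmax': 4.1,  'fmin': 5000, 'fmax': 10000},
    {'id': 'b94k2g0as', 'label': 4,  'tmin': 23.7, 'tmax': 40.3, 'fmin': 2500, 'fmax': 7000},
])

# Since multiple boxes may share the same audio id, labels are naturally
# grouped per clip before building dataset items
boxes_per_clip = df.groupby('id').size()
```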
```python
%%time
path = Path('/kaggle/kaggle_rainforest_audio/data')
rename_cols = RenameColumns(id='recording_id', label='species_id', tmin='t_min',
                            tmax='t_max', fmin='f_min', fmax='f_max')
df = Pipeline([load_dataframe, rename_cols, group_labels])(path/'train_tp.csv')
df.head()
```
```python
%%time
sample_rate, hop_length, n_mels, tile_width = 32000, 512, 128, 256
i = 15
wav = load_npy(path/'npy32000'/'train'/f'{df.loc[i].id}.npy')
wav, label = audio_crop(wav, df.loc[i], sample_rate=sample_rate, tile_width=tile_width)
plt.imshow(melspectrogram(wav, sample_rate)[0], cmap='RdYlGn_r')
plt.imshow(label, alpha=0.5, cmap='jet')
plt.show()
```
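`audio_crop` is defined elsewhere in this module; conceptually, it selects a window of `tile_width` spectrogram frames (i.e. `tile_width * hop_length` samples) starting at a labelled box and rasterizes the box into a `(n_mels, tile_width)` label mask, which is why the mask can be overlaid on the spectrogram above. A rough numpy sketch of that idea, under simplifying assumptions (a linear frequency-to-bin mapping up to Nyquist; not the actual implementation):

```python
import numpy as np

sample_rate, hop_length, n_mels, tile_width = 32000, 512, 128, 256

def crop_with_mask(wav, tmin, tmax, fmin, fmax, label):
    """Crop tile_width * hop_length samples starting at tmin, and build a
    (n_mels, tile_width) mask marking the labelled time/frequency box."""
    n_samples = tile_width * hop_length
    start = int(tmin * sample_rate)
    crop = wav[start:start + n_samples]
    mask = np.zeros((n_mels, tile_width), dtype=np.int64)
    # spectrogram frames covered by the box, relative to the crop start
    t1 = min(tile_width, int((tmax - tmin) * sample_rate / hop_length))
    # frequency bins covered, assuming bins spread linearly up to Nyquist
    f0 = int(fmin / (sample_rate / 2) * n_mels)
    f1 = int(fmax / (sample_rate / 2) * n_mels)
    mask[f0:f1, :t1] = label
    return crop, mask

wav = np.random.randn(60 * sample_rate)  # a fake 60 s clip
crop, mask = crop_with_mask(wav, tmin=2.4, tmax=4.1, fmin=5000, fmax=10000, label=10)
```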
```python
%%time
path = Path('/kaggle/kaggle_rainforest_audio/data')
rename_cols = RenameColumns(id='recording_id', label='species_id', tmin='t_min',
                            tmax='t_max', fmin='f_min', fmax='f_max')
df = Pipeline([load_dataframe, rename_cols, group_labels])(path/'train_tp.csv')
data = Datasets(items=df, tfms=partial(create_dataset_item, path=path, sample_rate=32000,
                                       tile_width=256))
dls = DataLoader(data, bs=64, do_batch=reorganize_batch)
xb, yb = dls.one_batch()
wav, label = data[15][0][0], data[15][0][1]
plt.imshow(melspectrogram(wav, sample_rate)[0], cmap='RdYlGn_r')
plt.imshow(label, alpha=0.5, cmap='jet')
plt.show()
```
```python
%%time
path = Path('/kaggle/kaggle_rainforest_audio/data')
rename_cols = RenameColumns(id='recording_id', label='species_id', tmin='t_min',
                            tmax='t_max', fmin='f_min', fmax='f_max')
df = Pipeline([load_dataframe, rename_cols, group_labels])(path/'train_tp.csv')
data = Datasets(items=df, tfms=partial(create_dataset_item, path=path, sample_rate=32000,
                                       tile_width=256))
dls = DataLoader(data, bs=64, do_batch=reorganize_batch,
                 after_item=partial(apply_augmentations,
                                    augs_pipeline=audio_augment(sample_rate, p=0.25)),
                 after_batch=MelSpectrogram(sample_rate))
xb, yb = dls.one_batch()
img, label = xb[15], yb[15]
plt.imshow(img[0], cmap='RdYlGn_r')
plt.imshow(label, alpha=0.5, cmap='jet')
plt.show()
```
```python
%%time
show_augmentations(data, dls, sample_rate=32000)
```